FOREWORD


This report was prepared for the Statistics Research Team, Flight Research
Laboratory, Wright Air Development Center, by Dr. Paul R. Rider, the project engineer
under RDO No. 460-51, The Statistical Analysis of Ranked Data.

ABSTRACT


Explanations are given of what is meant by ranked data. Questions of
rank correlation and concordance are discussed. Coefficients of correlation
and concordance are defined, and methods of testing them for significance
are described and illustrated.

INTRODUCTION

Many  of  the  problems  about  which  the  Statistics  Research  Team  is  consulted  involve 
ranked  data.  These  problems  have  to  do  with  such  diverse  things  as  photographic  film, 
binoculars  used  in  reconnaissance,  coffee  tasting,  recipe  testing,  food  preferences,  and 
the  findings  of  officers’  rating  and  promotion  boards. 

The  question  of  how  to  handle  ranked  statistical  data  therefore  seems  of  sufficient 
importance  to  warrant  the  publication  of  a  Technical  Report  on  the  subject. 

The object of this report is to discuss and explain, in as elementary a manner as is
possible with material that is somewhat technical, methods of analysis that are customarily
employed in dealing with ranked data. The questions of rank correlation and of
concordance of judgment in ranking are discussed. In particular, methods are given for
testing whether correlation or concordance that has been found is significant.

SECTION I
RANK  CORRELATION 


1. Rank. To rank a set of objects is to arrange them in order with respect to
some  characteristic.  The  set  of  objects  could,  for  example,  be  a  group  of  men  and  the 
characteristic  could  be  height.  When  the  men  are  arranged  in  order  of  height,  the 
tallest  is  assigned  the  rank  1,  the  next  tallest  the  rank  2,  and  so  on.  In  this  case 
the  characteristic  is  a  measurable  one,  and  the  ranking  is  merely  a  transformation  of 
variables. 

There  is  usually  a  distortion  in  such  a  transformation.  Thus,  consider  four  men 
whose  heights  are  6  feet,  2  inches;  6  feet;  5  feet,  11  inches;  and  5  feet,  7  inches 
respectively.  The  differences  in  height  between  consecutive  men  are  2  inches,  1  inch 
and 4 inches. Yet the difference in rank between any two consecutive men is 1. (It
is assumed for the present that there are no ties in rank.)

It  is  seen  that  in  the  case  of  a  measurable  characteristic,  rank  is  a  rather  rough 
way of assigning a numerical value to the degree in which the characteristic is possessed.
However, there are certain advantages in using ranks. One of these is that the
numbers  involved  in  statistical  computations  and  analyses  are  usually  simpler.  Another 
is  that  sometimes  a  set  of  numerical  data  will  be  dominated  by  one  or  two  large  items, 
whereas  if  the  items  are  ranked  the  undue  influence  of  these  items  is  eliminated.  (See 
Kendall,  Rank  Correlation,  pp.  14-15.) 

It  is  frequently  possible  to  rank  objects  according  to  some  characteristic  which 
is  difficult  or  even  impossible  to  measure.  Individuals  can  be  ranked  according  to 
intelligence  or  personality,  manufactured  articles  can  be  ranked  according  to  beauty  of 
design,  aircraft  can  be  ranked  according  to  performance  or  efficiency.  Some  of  the 
characteristics just mentioned are too vague to allow of measurement, yet they do permit
ranking.

2. Measuring Ability to Rank Correctly. We may at times want to measure the ability
of an individual to make judgments of a certain type by ranking a set of objects. For
example,  suppose  that  there  are  four  objects  of  the  same  size  and  shape  but  of  different 
weights  and  that  a  person  attempts  to  rank  them.  If  his  ability  to  arrange  them  in  the 
correct  order  is  to  be  measured,  it  would  seem  natural  that  he  should  receive  the  highest 
possible  score  if  he  ranks  them  in  the  correct  order  and  the  lowest  possible  score  if  he 
arranges  them  in  the  reverse  order.  Any  other  ordering  (that  is,  ranking)  should  give 
him  an  intermediate  score. 

In order to develop a measure of ranking ability let us consider a concrete case.
Suppose that a set of four objects has been placed in the order 2314 instead of the
correct or natural order 1234. We consider the number of pairs of ranks in the actual
ranking which are in natural order and the number of pairs of ranks which are in inverted
order. Let us score the pairs as in Table 1. When a pair of ranks is in the
natural order, we place a 1 in the natural order column, labeled P; when a pair of
ranks is in the inverted order, we place a 1 in the inverted order column, labeled Q.
In the score column, labeled $S_\tau$ (the subscript is used to avoid confusion with another
S which will be used later), we place a +1 for each 1 in the P column and a -1 for
each 1 in the Q column. It follows that

$$S_\tau = P - Q. \tag{1}$$

It is not necessary to construct a table. It has been constructed here for the
purpose of explaining the method of scoring. When this method is understood, the
value of P or Q or $S_\tau$ can be found quite quickly after a slight amount of practice.


TABLE 1

Pair of     Natural      Inverted     Score
ranks       order, P     order, Q     $S_\tau$

  23           1                        +1
  21                        1           -1
  24           1                        +1
  31                        1           -1
  34           1                        +1
  14           1                        +1

Total          4            2            2


The following fact should be noted, as it is very helpful in calculating the
score. If n is the number of objects ranked, then the number of pairs of ranks is
$\tfrac{1}{2}n(n-1)$. Consequently,

$$P + Q = \tfrac{1}{2}\,n(n-1), \tag{2}$$

and if either P or Q has been found then the other can be found at once, as can $S_\tau$.
Often the easiest procedure is first to find Q by counting the number of inverted pairs.

3. Measuring Agreement between Two Rankings. The preceding method can be used to
measure the agreement of two rankings. Suppose, for example, that two persons, X and Y,
rank the same set of five objects, a, b, c, d, e. Suppose that X ranks them in the
order c, e, a, d, b and that Y ranks them in the order a, c, e, b, d. We regard one of the
rankings (it does not matter which) as standard and compare the other ranking with it.
If we take X's ranking as the standard, the situation can be exhibited as follows:

Object:         c   e   a   d   b
X's ranking:    1   2   3   4   5
Y's ranking:    2   3   1   5   4

The rest of the procedure is as before. Here let us list and count the
inverted pairs in Y's ranking. They are 21, 31, 54; hence Q = 3. Since the number of
objects ranked is n = 5, we find from (2) that $P + 3 = \tfrac{1}{2} \times 5 \times 4 = 10$, or P = 7,
so that, by (1), $S_\tau = 7 - 3 = 4$.
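
The counting rule of sections 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function name `score` is a hypothetical choice, not something used in this report, and the sketch assumes that the ranking contains no ties.

```python
from itertools import combinations

def score(ranking):
    """Return (P, Q, S) for a ranking compared with the natural order 1, 2, ..., n.

    P counts the pairs in natural order, Q the inverted pairs, and S = P - Q.
    Assumes that no two ranks are tied.
    """
    p = q = 0
    # combinations() keeps the positional order, so `earlier` precedes `later`.
    for earlier, later in combinations(ranking, 2):
        if earlier < later:
            p += 1
        else:
            q += 1
    return p, q, p - q

print(score([2, 3, 1, 4]))     # the ranking 2314 scored in Table 1
print(score([2, 3, 1, 5, 4]))  # Y's ranking in section 3
```

Counting every pair directly costs about n(n - 1)/2 comparisons, which is negligible for the small sets of objects considered here.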

4. Kendall's Coefficient of Rank Correlation ($\tau$). It is easily seen that the
score $S_\tau$ discussed above is dependent upon n. For this reason it is a somewhat vague
measure of the agreement of one ranking with another. Thus, for example, if four objects
are being ranked, the maximum value that $S_\tau$ can have is 6. This would indicate perfect
agreement in ranking. On the other hand, a value of 6 for $S_\tau$ might indicate very poor
agreement if a larger number of objects were being ranked.

Consequently it is desirable to have a coefficient which will have the value +1
when two rankings are in perfect agreement and the value -1 when one ranking is exactly
the reverse of the other. Such a coefficient is Kendall's coefficient of rank correlation,

$$\tau = \frac{S_\tau}{\tfrac{1}{2}\,n(n-1)}. \tag{3}$$

(The denominator is the maximum value that $S_\tau$ can have in the case of n ranks.) Equivalent
expressions for $\tau$ are the following:

$$\tau = \frac{2P}{\tfrac{1}{2}\,n(n-1)} - 1, \tag{4}$$

$$\tau = 1 - \frac{2Q}{\tfrac{1}{2}\,n(n-1)}, \tag{5}$$

where P and Q have been defined earlier. In the above example $\tau$ has the value 2/5 = 0.4.
The coefficient $\tau$ is a measure of the correlation or agreement of any ranking with
a standard ranking or of two rankings with each other. It may also be used to measure
the correlation between two characteristics when the same set of objects has been ranked
with respect to both of these characteristics. For example, suppose that a group of ten
men have been ranked with respect to initiative and also with respect to reliability.
The value of $\tau$ can of course be calculated; it measures the correlation between the two
traits or characteristics in this group of ten men.
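
As a minimal sketch of formula (3), the coefficient can be computed directly from the two lists of ranks without first rearranging one ranking into natural order. The function name `kendall_tau` is a hypothetical choice, and the sketch assumes untied ranks.

```python
from itertools import combinations

def kendall_tau(x_ranks, y_ranks):
    """Kendall's coefficient for two untied rankings of the same n objects."""
    n = len(x_ranks)
    s = 0
    for i, j in combinations(range(n), 2):
        # A pair scores +1 when both rankings order it alike, -1 otherwise.
        same_order = (x_ranks[i] - x_ranks[j]) * (y_ranks[i] - y_ranks[j]) > 0
        s += 1 if same_order else -1
    return s / (n * (n - 1) / 2)

# Ranks given by X and Y to the objects a, b, c, d, e of section 3.
x = [3, 5, 1, 4, 2]
y = [1, 4, 2, 5, 3]
print(kendall_tau(x, y))
```

For these data the score is 4 and the number of pairs is 10, so the function reproduces the value 0.4 found in the text.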

5. Spearman's Coefficient of Rank Correlation ($\rho$). Since it is frequently referred
to, we shall discuss briefly at this point another coefficient. This is Spearman's
coefficient of rank correlation. It is designated by $\rho$ and is calculated as in the
following example.


Consider the rankings used in a previous illustration:

Object:            a    b    c    d    e
X's ranking:       3    5    1    4    2
Y's ranking:       1    4    2    5    3
Difference, d:     2    1   -1   -1   -1
$d^2$:             4    1    1    1    1

If the sum of squares of the differences is denoted by $\sum d^2$, then Spearman's coefficient
is defined by the equation

$$\rho = 1 - \frac{6 \sum d^2}{n(n+1)(n-1)}, \tag{6}$$

in which n, as usual, denotes the number of objects ranked. In the present example,

$$\rho = 1 - \frac{6 \times 8}{5 \times 6 \times 4} = \frac{3}{5} = 0.6.$$



For the same set of rankings the value of $\tau$ was calculated to be 0.4. In general
the values of $\tau$ and $\rho$ will not be the same. Spearman's coefficient is merely the ordinary
Pearsonian coefficient of correlation between two rankings. Like Kendall's coefficient,
it has the value +1 when the rankings are in perfect agreement and the value -1 when the
rankings are in perfect disagreement, that is, when one ranking is exactly the reverse of
the other. On the whole, Kendall's coefficient is to be preferred.
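
Formula (6) translates almost word for word into code. The following is a sketch only; `spearman_rho` is a hypothetical name, and untied ranks are assumed.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's coefficient from formula (6); assumes untied ranks."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n + 1) * (n - 1))

# Ranks given by X and Y to the objects a, b, c, d, e.
print(spearman_rho([3, 5, 1, 4, 2], [1, 4, 2, 5, 3]))
```

Here the sum of squared differences is 8, and the result agrees with the value 0.6 obtained above, up to floating-point rounding.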

6.  Ties  in  Ranks.  It  is  readily  realized  that  ties  in  rank  may  sometimes  occur, 
since  two  or  more  objects  may  possess  a  certain  characteristic  in  exactly  the  same 
degree or in indistinguishable degrees. In the case of ties some adjustments are necessary
in dealing with correlation.

In the first place, the convention which we shall adopt in assigning ranks in the
case of a tie is an averaging process. For example, if it is impossible to distinguish
between the objects which would be ranked fourth and fifth, the average rank of 4½ will
be assigned to each. If four objects are tied for ranks 2, 3, 4, 5, the rank assigned
to each will be the average (2 + 3 + 4 + 5)/4 = 3½. (The same result is obtained if the
first and last ranks, namely 2 and 5, are averaged.)
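
The averaging convention can be sketched as follows. The function name `average_ranks`, the grouped-list input format, and the object labels p to u are all hypothetical choices made for this illustration.

```python
def average_ranks(groups):
    """Assign average ranks to objects listed in order of rank.

    `groups` is a list of lists; objects in the same inner list are tied.
    Returns a dictionary mapping each object to its (possibly fractional) rank.
    """
    ranks = {}
    next_rank = 1
    for group in groups:
        first, last = next_rank, next_rank + len(group) - 1
        for obj in group:
            # Mean of the ranks occupied by the group, found from its two ends.
            ranks[obj] = (first + last) / 2
        next_rank = last + 1
    return ranks

# Six hypothetical objects, four of them tied for ranks 2, 3, 4, 5.
print(average_ranks([["p"], ["q", "r", "s", "t"], ["u"]]))
```

Averaging the first and last ranks of a tied group is used here because, as noted above, it gives the same result as averaging all the ranks in the group.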

7. Calculation of $S_\tau$ and $\tau$ for Tied Ranks. When there are ties in ranks it becomes
impossible to use the formulas previously given for $S_\tau$ and $\tau$. Consequently, new definitions
for these two functions will be given. The new definitions will yield the same
results as the old ones for the case in which there are no ties.

In order that the meanings of the definitions may be clear let us consider an
example of tied rankings. Suppose that six objects, a, b, c, d, e, f, are ranked by
X and Y as follows:

X's ranking: (c, d tied), b, (a, e, f tied)
Y's ranking: d, c, (b, f tied), a, e

Thus we have

Object:         a     b     c     d     e     f
i and j:        1     2     3     4     5     6
X's ranking:    5     3     1½    1½    5     5
Y's ranking:    5     3½    2     1     6     3½

The row labeled 'i and j' gives the number of each object. It is placed in the foregoing
scheme for later use in explanation and calculation. Incidentally, these objects may be
arranged in any order. Here, since they are represented by letters of the alphabet, it
seemed natural to arrange them in alphabetical order.

We now define the quantity $x_{ij}$ (i < j) to be +1 if X's ranking of the ith object
is less than his ranking of the jth object, -1 if his ranking of the ith object is
greater than his ranking of the jth object, and 0 if his ranking of the ith object is
the same as his ranking of the jth object. For illustration, $x_{12} = -1$, since his ranking
of object number 1 (namely 5) is greater than his ranking of object number 2 (namely 3);
$x_{34} = 0$, since X's rankings of the 3rd and 4th objects are the same (namely 1½); $x_{36} = +1$,
since the rank 1½ is less than the rank 5.

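The quantity $y_{ij}$ is similarly defined. Then

$$S_\tau = \sum x_{ij}\, y_{ij}, \tag{7}$$

$$\tau = \frac{\sum x_{ij}\, y_{ij}}{\sqrt{\sum x_{ij}^2}\,\sqrt{\sum y_{ij}^2}}, \tag{8}$$

where each sum extends over all pairs of objects. It will be noted that the numerator of $\tau$ is $S_\tau$.

The calculation of $\tau$ for the foregoing example is shown in Table 2.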


TABLE 2

 ij    $x_{ij}$   $y_{ij}$   $x_{ij} y_{ij}$        ij    $x_{ij}$   $y_{ij}$   $x_{ij} y_{ij}$

 12      -1       -1          1             26       1        0          0
 13      -1       -1          1             34       0       -1          0
 14      -1       -1          1             35       1        1          1
 15       0        1          0             36       1        1          1
 16       0       -1          0             45       1        1          1
 23      -1       -1          1             46       1        1          1
 24      -1       -1          1             56       0       -1          0
 25       1        1          1             Total (entire table)        10

It is readily seen that $\sum x_{ij}^2 = 11$, $\sum y_{ij}^2 = 14$, $\sum x_{ij} y_{ij} = 10$ $(= S_\tau)$, and, from (8),

$$\tau = \frac{10}{\sqrt{11}\,\sqrt{14}} = 0.81.$$


As was stated earlier, if there are no ties in ranks the value of $S_\tau$ calculated
according to the new definition will be identical with that calculated according to
the first definition. Moreover, when there are no ties, both $\sum x_{ij}^2$ and $\sum y_{ij}^2$ reduce to
$\tfrac{1}{2}n(n-1)$, so that the product of the square roots of these quantities has this value,
and we are led to formula (3). The reader will find it instructive to calculate $S_\tau$ and
$\tau$ for the example of section 3, employing the new method.
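
The pairwise definition of this section lends itself to a direct computation. The sketch below rebuilds the totals of Table 2 from the two lists of ranks; the names `sign` and `tau_with_ties` are hypothetical, and fractional ranks such as 1½ are written as decimals.

```python
from itertools import combinations
from math import sqrt

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def tau_with_ties(x_ranks, y_ranks):
    """Return (S, sum of x_ij^2, sum of y_ij^2, tau) from the pairwise scores."""
    s = sum_x2 = sum_y2 = 0
    for i, j in combinations(range(len(x_ranks)), 2):
        x_ij = sign(x_ranks[j] - x_ranks[i])  # +1 if object i has the lower rank
        y_ij = sign(y_ranks[j] - y_ranks[i])
        s += x_ij * y_ij
        sum_x2 += x_ij ** 2
        sum_y2 += y_ij ** 2
    # Denominator: product of the square roots of the two sums of squares.
    return s, sum_x2, sum_y2, s / sqrt(sum_x2 * sum_y2)

x = [5, 3, 1.5, 1.5, 5, 5]   # X's ranks of a, b, c, d, e, f
y = [5, 3.5, 2, 1, 6, 3.5]   # Y's ranks of the same objects
print(tau_with_ties(x, y))
```

Applied to untied ranks, the same function returns the score and coefficient of the earlier definition, which is the exercise suggested to the reader above.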

8. Alternative Formula for $\tau$. As was stated in the preceding section, when there
are no ties, both $\sum x_{ij}^2$ and $\sum y_{ij}^2$ (i < j) reduce to $\tfrac{1}{2}n(n-1)$. This is readily seen,
since in this case each $x_{ij}$ and $y_{ij}$ will be either +1 or -1. However, if a tie of t
objects exists in X's rankings, then each corresponding $x_{ij}$ will be 0. Now for t objects
there are (since we are taking i < j) exactly $\tfrac{1}{2}t(t-1)$ such values of $x_{ij}$. Thus, the
total number of zero values for $x_{ij}$ is $\tfrac{1}{2}\sum t(t-1)$. For instance, if 2 objects are
tied in rank, also 3 others, then

$$\tfrac{1}{2}\sum t(t-1) = \tfrac{1}{2} \times 2 \times 1 + \tfrac{1}{2} \times 3 \times 2 = 4.$$

We may therefore write

$$\sum x_{ij}^2 = \tfrac{1}{2}\,n(n-1) - \tfrac{1}{2}\sum t(t-1). \tag{9}$$

Similarly, if we use u to denote ties in Y's ranking, we have the corresponding formula

$$\sum y_{ij}^2 = \tfrac{1}{2}\,n(n-1) - \tfrac{1}{2}\sum u(u-1). \tag{10}$$

Consequently the denominator in (8) may be replaced by the product of the square roots
of the right-hand members of (9) and (10).
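
Formulas (9) and (10) need only the sizes of the tied groups, so the denominator can be found without listing any pairs. The sketch below applies them to the six-object example of section 7; `sum_of_squares` is a hypothetical name, and the score 10 is taken from Table 2.

```python
from math import sqrt

def sum_of_squares(n, tie_sizes):
    """Right-hand member of (9) or (10): number of pairs not lost to ties."""
    return n * (n - 1) // 2 - sum(t * (t - 1) // 2 for t in tie_sizes)

n = 6
sum_x2 = sum_of_squares(n, [2, 3])  # X: c and d tied; a, e and f tied
sum_y2 = sum_of_squares(n, [2])     # Y: b and f tied
tau = 10 / sqrt(sum_x2 * sum_y2)    # score 10 taken from Table 2
print(sum_x2, sum_y2, round(tau, 2))
```

The two sums come out as 11 and 14, the same values obtained by squaring the entries of Table 2 one at a time.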


SECTION  II 

SIGNIFICANCE  TESTS  FOR  RANK  CORRELATION 

9. Significance Tests for Rank Correlation. Let us consider a set of objects
which possess a certain characteristic or quality in different degrees, so that they
have an inherent ranking. Suppose that a person attempts to rank them. He will make
a certain score $S_\tau$. What information does this score yield concerning the ability of
the person to judge this particular quality? Might he not have achieved a score this
high or higher simply by ranking the objects at random?

If two persons are ranking the same set of objects, does a certain score $S_\tau$ really
indicate that they are in substantial agreement or might not the score, or a higher one,
have occurred purely as a matter of chance?

If a certain value has been found for the correlation coefficient $\tau$, can we conclude
that there is actually some correlation between the two characteristics being
investigated? Perhaps the value is so large that correlation is unmistakably indicated
for this given set of objects. However, this set may be regarded as a sample from a
larger 'population' of similar objects. Another sample would doubtless yield a different
value of $\tau$. Therefore, can the value actually found be interpreted as indicating the
existence of correlation in the population, or might not a value this large or larger
happen fairly often in samples?

Questions such as the foregoing suggest the desirability of having some means of
testing how unusual an observed score or coefficient is. If a certain score is unusual,
we may say that it is significant, meaning that its value is decidedly different from
what is to be expected as a matter of chance. A test which tells how unusual an observed
value is, is called a significance test.

The  following  considerations  may  throw  some  light  on  the  meaning  of  a  significance 
test  as  well  as  showing  how  such  a  test  may  sometimes  be  devised. 

For  four  objects,  a,  b,  c,  d,  there  are  24  possible  rankings.  (For  n  objects  there 
are  n!  possible  rankings.)  These  24  possibilities,  together  with  the  corresponding 
scores,  are  shown  in  Table  3. 


TABLE 3

Ranking   $S_\tau$      Ranking   $S_\tau$      Ranking   $S_\tau$

 1234        6         2314        2         3412       -2
 1243        4         2341        0         3421       -4
 1324        4         2413        0         4123        0
 1342        2         2431       -2         4132       -2
 1423        2         3124        2         4213       -2
 1432        0         3142        0         4231       -4
 2134        4         3214        0         4312       -4
 2143        2         3241       -2         4321       -6


The information given in Table 3 is summarized in Table 4. In the first column
of the latter are listed the various values of $S_\tau$. In the second column are listed the
corresponding values of $\tau$. We find from (3), since here n = 4, that each value of $\tau$ is
1/6 of the corresponding value of $S_\tau$.
The third column gives the frequency of occurrence of the values of $S_\tau$ and $\tau$. In the
fourth column these frequencies have been converted to probabilities by division by 24.
Thus, for example, the 0.125 on the same line with the -4 in the $S_\tau$ column means that
there are 125 chances in 1,000 (that is, 1 chance in 8) of obtaining a score of -4
(or a value of $\tau$ equal to -0.67) if four objects are ranked by some purely random process.

WADC TR 52-32

TABLE 4

$S_\tau$      $\tau$       Frequency    Probability    Cum. Prob.

  -6       -1           1          0.042         1.000
  -4       -0.67        3          0.125         0.958
  -2       -0.33        5          0.208         0.833
   0        0           6          0.250         0.625
   2        0.33        5          0.208         0.375
   4        0.67        3          0.125         0.167
   6        1           1          0.042         0.042

Total                  24          1.000

The final column is the cumulative probability, that is, the probability of obtaining
a value of $S_\tau$ or $\tau$ as large as or larger than that shown in the same line. For example,
the probability of obtaining a value of $S_\tau$ equal to or greater than 2 (or a value of $\tau$
equal to or greater than 0.33) is 0.375. The probability of a value of $S_\tau$ equal to or
greater than 2 in absolute value (or a value of $\tau$ equal to or greater than 0.33 in
absolute value) is 0.375 + 0.375, or 0.750.
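
Tables 3 and 4 can be regenerated by enumerating every possible ranking, which also shows how such a table would be built for other small values of n. This is a sketch under the assumption of untied ranks; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def s_score(ranking):
    """Score P - Q of a ranking against the natural order (no ties)."""
    return sum(1 if a < b else -1 for a, b in combinations(ranking, 2))

def null_distribution(n):
    """Frequency of each score over all n! equally likely rankings."""
    return Counter(s_score(p) for p in permutations(range(1, n + 1)))

def upper_tail(freq, s):
    """Probability of a score equal to or greater than s."""
    total = sum(freq.values())
    return Fraction(sum(c for v, c in freq.items() if v >= s), total)

freq = null_distribution(4)
for s in sorted(freq):
    # score, frequency, cumulative probability of a score this large or larger
    print(s, freq[s], float(upper_tail(freq, s)))
```

Because the number of rankings grows as n!, full enumeration is practical only for small n, which is one reason the normal approximation is used for larger sets.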

10. Meaning of a Significance Test. The average score and the average value of $\tau$
in Table 4 are 0. This is to be expected, since a person possessing no ability to judge
a certain characteristic would obtain positive and negative scores having the same
numerical value with about equal frequency. Likewise, if two persons are ranking the
same set of things and each is performing the ranking by some random process, they will
be in disagreement just about as often as they are in agreement. Furthermore, if samples
are taken from a population in which no correlation exists between two characteristics,
then it would seem rather reasonable to find that the value of either $S_\tau$ or $\tau$ turns out
to be 0 on the average.

The more that a value of $S_\tau$ or $\tau$ deviates from 0, the less likely is this value to
occur. If the probability of obtaining a score equal to or greater than a specified
value $S_\tau$ is p, then $S_\tau$ is said to be significant at the level p. Similarly, if the
probability of obtaining a correlation coefficient equal to or greater than $\tau$ is p,
then $\tau$ is said to be significant at the level p. Since the values of $S_\tau$ and $\tau$ are symmetrically
distributed, when a value of either is significant at the level p, then the
corresponding absolute value is significant at the level 2p. Thus, from Table 4, it
is seen that the significance level of the value 6 for $S_\tau$ (or of 1 for $\tau$) is 0.042, or
4.2%. The significance level of $|S_\tau| = 6$ (or of $|\tau| = 1$) is twice 4.2%, or 8.4%.

A significance test involving absolute values is often called a two-sided test;
one involving algebraic values is called a one-sided test. Care must be taken to note
which kind of test is being used.

The significance levels most frequently used in practice are 5% and 1%. It is
customary to say that a value which is significant at the 5% level is 'significant', and
that a value which is significant at the 1% level is 'highly significant'. These levels
and these terms are entirely arbitrary, however.

11. Tables for Testing Significance of Rank Correlation. Tables such as Table 4,
showing the frequency or probability distribution of $S_\tau$ and $\tau$, are easily constructed.
However, it is possible to make use of existing tables such as those found in [1],* vol. 1,
pages 404-405; [2], page 141; and [3], pages 620-621. The last reference gives the
distribution of Q, the number of inversions in rank, which is quite equivalent to the
distribution of $S_\tau$.

The tables referred to above extend only as far as n = 10, that is, they can be
used only when 10 or fewer objects are ranked. It can be shown that, as n increases,
the distribution of $S_\tau$ approaches a normal distribution with mean 0 and variance

$$n(n-1)(2n+5)/18. \tag{11}$$

When n is greater than 10 we assume that $S_\tau$ is distributed in this way and make use of
a table of the normal probability integral. When the normal distribution is used in
testing a value of $S_\tau$ we make a so-called correction for continuity, which consists in
subtracting 1 from $S_\tau$.

As an illustration, suppose that a value of $S_\tau = 55$ has been obtained in the
ranking of 14 objects. According to (11) the variance of $S_\tau$ is 14 × 13 × 33/18 = 333.67.
The standard deviation of $S_\tau$ is the square root of this value, namely 18.3. We make the
correction for continuity and calculate the quantity x = (55 - 1)/18.3 = 2.95. Considering
this value as a normal deviate with unit standard deviation, we find, from tables
of the normal probability integral, that the probability of an absolute value this large
or larger is 0.0032. That is, such values would happen, as a matter of pure chance,
only about 32 times out of 10,000. The value therefore is very significant. Stating
the matter in a slightly different form, if we hypothesize that the ranking of these 14
objects was done by a purely random process, then a value of $S_\tau$ as large as or larger
than the one observed, namely 55, would cause us to reject the hypothesis.
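
The normal approximation of this section can be sketched with the standard library alone, using the complementary error function in place of a printed table of the normal probability integral. The name `normal_test` is hypothetical, and taking the absolute value of the score before the continuity correction is an assumption made here so that negative scores are handled symmetrically.

```python
from math import erfc, sqrt

def normal_test(s, n):
    """Return (deviate, two-sided probability) for a nonzero score s from n untied ranks."""
    variance = n * (n - 1) * (2 * n + 5) / 18   # formula (11)
    # Continuity correction of 1; abs() is an assumption covering negative scores.
    x = (abs(s) - 1) / sqrt(variance)
    p_two_sided = erfc(x / sqrt(2))             # P(|Z| >= x) for a standard normal Z
    return x, p_two_sided

x, p = normal_test(55, 14)
print(round(x, 2), round(p, 4))
```

The deviate computed this way differs slightly in the last digit from the 2.95 of the text because the standard deviation is not rounded to 18.3 first; the two-sided probability is still close to the tabled 0.0032.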

It may be noted at this point that the distribution of $\tau$ also tends to normality
with increasing n.

* Numbers in square brackets refer to the corresponding numbers in the Bibliography at the end of
this report.

12. Significance Tests for Rank Correlation When There Are Tied Ranks. According
to Kendall ([2], page 43), the distribution of $\tau$ for any fixed number of ties approaches
normality with increasing n, and it is usually permissible to employ the normal approximation
when n > 10, although when many ties exist special consideration may be necessary.
For n < 10, the tables of Sillito [4] will be found useful.

13. Tests When Correlation Exists in the Population. It has been pointed out
(section 9) that when a measure of the rank correlation between two characteristics has
been calculated for a given set of objects, this set may be regarded as a sample from
a 'population' of similar objects. This population will have a value of the rank correlation
coefficient, let us call it T, which may or may not be zero. We may wish to test
whether the value of $\tau$ observed in a sample deviates significantly from the population
value T.

For  definiteness  let  us  consider  a  population  of  five  objects  having  inherent 
rankings  according  to  two  different  characteristics.  Suppose  that  when  the  objects  are 
arranged  according  to  these  characteristics  the  situation  is  as  follows. 

Rank according to 1st characteristic:    1  2  3  4  5
Rank according to 2nd characteristic:    1  5  3  2  4

Suppose that we take a sample of three from this population, for example,

2  3  5
5  3  4

In this sample the value of $S_\tau$ is found to be -1 and the value of $\tau$ is -1/3. Now
the algebra of combinations tells us that the number of possible samples of three from
this population of five is 10. Since, in any sample, the ranking according to the
first characteristic will always be in natural order, it is necessary to consider the
ranking according to the second characteristic only. In Table 5 the 10 possible samples
are listed (second ranking only), together with the corresponding values of $S_\tau$ and $\tau$.

TABLE 5

Sample    $S_\tau$      $\tau$         Sample    $S_\tau$      $\tau$

 153        1       1/3          124        3        1
 152        1       1/3          532       -3       -1
 154        1       1/3          534       -1      -1/3
 132        1       1/3          524       -1      -1/3
 134        3        1           324        1       1/3

The mean value of $\tau$ is readily found to be 1/5. The value of the population
correlation coefficient, T, is also 1/5, and it can be shown that in general the mean
value of $\tau$ in samples is always equal to T. A similar statement cannot be made about
the average value of $S_\tau$ in samples, however. Nor can much be said about the variance of
$\tau$ in samples except that it can never exceed

$$2(1 - T^2)/n, \tag{12}$$

where n is the number of rankings in the sample. Practically all that can be done is
to assume that $\tau$ is normally distributed with mean T and with standard deviation equal
to the square root of the expression (12). This gives a conservative test in the sense
that will be explained after an example. If n is less than 10 there seems to be no good
test available.
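
The statement that the mean of the sample coefficients equals the population value can be checked for the five-object population by listing all ten samples of three. The sketch below uses exact fractions; the function name `tau` and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def tau(ranking):
    """Kendall's coefficient of an untied ranking against the natural order."""
    pairs = list(combinations(ranking, 2))
    s = sum(1 if a < b else -1 for a, b in pairs)
    return Fraction(s, len(pairs))

# Ranks by the second characteristic, with the first in natural order.
population = [1, 5, 3, 2, 4]
samples = list(combinations(population, 3))   # the 10 samples of three
mean_tau = sum(tau(s) for s in samples) / len(samples)
print(len(samples), mean_tau, tau(population))
```

Only the relative order within a sample matters, so the sample ranks need not be renumbered 1, 2, 3 before scoring.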

As an illustration of testing the significance of an observed value of Kendall's
coefficient, let us suppose that the value $\tau = 0.82$ has been calculated from a sample of
15. Can the sample be regarded as having come from a population in which the correlation
coefficient is T = 0.50?

Using (12) we calculate the maximum value of the variance of $\tau$ to be 2(1 - 0.25)/15
= 0.10. The corresponding maximum value of the standard deviation of $\tau$ is
$\sigma_\tau = \sqrt{0.10} = 0.316$. Next we calculate the quantity

$$x = \frac{\tau - T}{\sigma_\tau} = \frac{0.82 - 0.50}{0.316} = 1.01.$$

Regarding this value as a normal deviate, we find, upon consulting a table of the normal
probability integral, that the probability of a deviation this large or larger numerically
is 0.312. Thus we cannot reject the hypothesis that the sample came from a population
having T = 0.50.
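
The same calculation can be sketched in Python, with the complementary error function standing in for the table of the normal probability integral. The name `conservative_test` is hypothetical; the standard deviation used is the upper bound given by (12), not an estimate of the true value.

```python
from math import erfc, sqrt

def conservative_test(tau, T, n):
    """Return (deviate, two-sided probability) using the bound (12) on the variance."""
    max_sd = sqrt(2 * (1 - T ** 2) / n)   # largest possible standard deviation of tau
    x = (tau - T) / max_sd
    p_two_sided = erfc(abs(x) / sqrt(2))  # P(|Z| >= |x|) for a standard normal Z
    return x, p_two_sided

x, p = conservative_test(0.82, 0.50, 15)
print(round(x, 2), round(p, 3))
```

Because the largest possible standard deviation is used, the deviate can only be understated, never overstated.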

Now the value which we have used for $\sigma_\tau$, namely 0.316, is the maximum value which
$\sigma_\tau$ can have. It might be smaller than this, in which case our value of x would have been
larger. Conceivably it could have been large enough to have caused us to reject the hypothesis
that T = 0.50. Thus, using the maximum value of $\sigma_\tau$ will never cause significance
to be indicated oftener than it should be, and in this sense it is conservative.

SECTION III
RANK  CONCORDANCE 


14.  Concordance.  Up  to  this  point  we  have  considered  the  case  in  which  just  two 
rankings  are  involved.  It  is,  however,  desirable  to  have  some  measure  of  the  agreement 
in  rankings  when  there  are  several  persons  making  the  rankings. 

Suppose  then  that  there  are  m  rankings  of  n  objects.  To  make  the  matter  more  concrete 
let  us  consider  the  following  case  of  3  judges,  X,  Y,  and  Z,  who  have  ranked  5  objects,  a, 
b,  c,  d,  e. 

Object:         a   b   c   d   e

X's ranking:    4   1   2   3   5
Y's ranking:    3   4   1   2   5
Z's ranking:    1   4   2   5   3

Sum:            8   9   5  10  13

We have given not only the rankings but also the sum of ranks. It can be shown that the
'best' estimate of the ranks, in a certain least-squares sense, is that obtained by
ranking the objects according to the sum of the ranks assigned to them by the judges. In
the present case the object c would be given rank 1, a would be given rank 2, b rank 3,
d rank 4, and e rank 5.

The grand total of the sum of ranks is 8 + 9 + 5 + 10 + 13 = 45, and the mean sum is
45/5 = 9. In general the grand total is $\tfrac{1}{2}mn(n + 1)$ and the mean sum is $\tfrac{1}{2}m(n + 1)$.

If there were complete agreement among the 3 rankings the sums would be 3, 6, 9, 12,
15 (although not necessarily in this order). In general the sums would be m, 2m, ...,
nm.

Let us designate by $S_W$ the sum of squares of deviations from the mean sum. This may
be taken as a measure of the agreement, or concordance, among the rankings. In the
example under consideration,

$$S_W = (8 - 9)^2 + (9 - 9)^2 + (5 - 9)^2 + (10 - 9)^2 + (13 - 9)^2 = 34.$$

In the general case the maximum value that $S_W$ can have is $m^2 n(n + 1)(n - 1)/12$,
which is the value it assumes when the agreement is perfect.
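
The rank sums, the sum of squares of their deviations from the mean sum, and its maximum can be checked with a brief sketch. The rankings are those of judges X, Y, and Z in the example; the function names are illustrative and not part of the report.

```python
# The three rankings of objects a..e, copied from the example in the text.
rankings = [
    [4, 1, 2, 3, 5],  # judge X
    [3, 4, 1, 2, 5],  # judge Y
    [1, 4, 2, 5, 3],  # judge Z
]

def rank_sums(rankings):
    """Sum of the ranks given to each object (column sums)."""
    return [sum(column) for column in zip(*rankings)]

def sum_of_squares(sums, m, n):
    """Squared deviations of the rank sums from the mean sum m(n + 1)/2."""
    mean_sum = m * (n + 1) / 2
    return sum((s - mean_sum) ** 2 for s in sums)

m, n = len(rankings), len(rankings[0])
sums = rank_sums(rankings)
s_w = sum_of_squares(sums, m, n)
s_w_max = m ** 2 * n * (n + 1) * (n - 1) / 12

# Pooled ranking: order the objects by their rank sums (no ties occur here).
objects = "abcde"
pooled = [obj for _, obj in sorted(zip(sums, objects))]

print(sums, s_w, s_w_max, pooled)
```

For these data the sketch gives the sums 8, 9, 5, 10, 13, the value 34 for the sum of squares against a maximum of 90, and the pooled order c, a, b, d, e.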

15. Coefficient of Concordance (W). We now define $W$, the coefficient of concordance,
by means of the following equation:

$$W = \frac{12\, S_W}{m^2 n (n + 1)(n - 1)}. \qquad (13)$$
This coefficient can vary in value between 0 and 1, assuming the latter value when the
concordance is perfect. In the present example,

$$W = \frac{12 \times 34}{3^2 \times 5 \times 6 \times 4} = 0.38.$$
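
As a small check of (13), the following sketch evaluates the coefficient for the example (sum of squares 34, with m = 3 and n = 5) and for the case of perfect agreement. The function name is illustrative.

```python
def coefficient_of_concordance(s_w, m, n):
    """Equation (13): W = 12 * S_W / (m^2 * n * (n + 1) * (n - 1))."""
    return 12 * s_w / (m ** 2 * n * (n + 1) * (n - 1))

# Example from the text: sum of squares 34 from 3 rankings of 5 objects.
w_example = coefficient_of_concordance(34, m=3, n=5)

# Perfect agreement: rank sums m, 2m, ..., nm, here 3, 6, 9, 12, 15.
perfect_sums = [3, 6, 9, 12, 15]
mean_sum = sum(perfect_sums) / len(perfect_sums)
s_w_perfect = sum((s - mean_sum) ** 2 for s in perfect_sums)
w_perfect = coefficient_of_concordance(s_w_perfect, m=3, n=5)

print(round(w_example, 2), w_perfect)
```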

Unlike $\tau$, $W$ cannot assume negative values. The value $-1$ is assumed by $\tau$ when there is
complete disagreement between 2 rankings, that is, when one ranking is exactly the
reverse of the other. Complete disagreement is impossible, however, in the case of more
than 2 rankings. Thus, if X and Y are in complete disagreement, Z cannot be in complete
disagreement with both of them.

16. Relation between W and Spearman's Coefficient. Suppose that Spearman's coefficient
is calculated for each pair of rankings and the average of the values obtained is
denoted by $\bar{\rho}$; then it can be shown that

$$\bar{\rho} = \frac{mW - 1}{m - 1}, \qquad (14)$$

or

$$W = \frac{(m - 1)\bar{\rho} + 1}{m}. \qquad (15)$$
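
The relation in (14) and (15) can be verified numerically on the three rankings of the example in Section III. The sketch below assumes the usual form of Spearman's coefficient for untied ranks, $1 - 6\sum d^2 / (n^3 - n)$; the function names are illustrative.

```python
from itertools import combinations

rankings = [
    [4, 1, 2, 3, 5],  # judge X
    [3, 4, 1, 2, 5],  # judge Y
    [1, 4, 2, 5, 3],  # judge Z
]

def spearman(r1, r2):
    """Spearman's coefficient for two untied rankings."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n ** 3 - n)

def concordance(rankings):
    """Coefficient of concordance W from equation (13)."""
    m, n = len(rankings), len(rankings[0])
    mean_sum = m * (n + 1) / 2
    s_w = sum((sum(col) - mean_sum) ** 2 for col in zip(*rankings))
    return 12 * s_w / (m ** 2 * n * (n + 1) * (n - 1))

m = len(rankings)
pairs = list(combinations(rankings, 2))
rho_bar = sum(spearman(a, b) for a, b in pairs) / len(pairs)
w = concordance(rankings)

rho_from_w = (m * w - 1) / (m - 1)        # equation (14)
w_from_rho = ((m - 1) * rho_bar + 1) / m  # equation (15)
print(round(rho_bar, 4), round(rho_from_w, 4), round(w, 4), round(w_from_rho, 4))
```

For these data the three pairwise coefficients are 0.4, -0.3, and 0.1, so their average is 1/15, which agrees with the value obtained from W by (14).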


SECTION  IV 

SIGNIFICANCE  TESTS  FOR  CONCORDANCE 

17. Significance Tests for Concordance. Tests of significance similar to those
applied to $S_\tau$ can also be applied to $S_W$. Tables for this purpose are to be found in
[1] vol. 1 and [2].

As an example, suppose that a value $S_W = 70$ has been obtained from 3 rankings of
5 objects. In [1], vol. 1, page 415, Table 16.8 or [2], page 149, Appendix, Table 5D,
it is found that a value of $S_W$ this large or larger has a probability of 0.026. This
value may therefore be regarded as significant.

Since the tables referred to are not extensive it is useful to have methods of
testing the significance of concordance for larger values of m and n. One such method
consists in making the transformation

$$F = \frac{(m - 1)W}{1 - W}, \qquad \nu_1 = n - 1 - \frac{2}{m}, \qquad \nu_2 = (m - 1)\nu_1, \qquad (16)$$

and using tables of Snedecor's F (for example, those to be found in [3]) with $\nu_1$ and $\nu_2$
degrees of freedom. In this case a correction for continuity should be made. This
consists in subtracting 1 from the numerator and adding 2 to the denominator of the fraction
in

$$W = \frac{S_W}{m^2 (n^3 - n)/12}.$$

To illustrate the method we shall use the above example, although naturally if the
values of m and n fall within the range covered by probability tables of $W$ these tables
should be used. We find

$$W = \frac{70 - 1}{9 \times 120/12 + 2} = 0.75,$$

$$F = \frac{(3 - 1) \times 0.75}{1 - 0.75} = 6.00,$$

$$\nu_1 = 5 - 1 - \frac{2}{3} = 3\tfrac{1}{3}, \qquad \nu_2 = (3 - 1) \times 3\tfrac{1}{3} = 6\tfrac{2}{3}.$$
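
The continuity correction and the transformation (16) can be sketched as follows for the example of 3 rankings of 5 objects with a sum of squares of 70. The function names are illustrative.

```python
def corrected_w(s_w, m, n):
    """W with the continuity correction: subtract 1 from the numerator
    and add 2 to the denominator of S_W / (m^2 (n^3 - n) / 12)."""
    return (s_w - 1) / (m ** 2 * (n ** 3 - n) / 12 + 2)

def f_transformation(w, m, n):
    """Equation (16): F and its two (generally fractional) degrees of freedom."""
    f = (m - 1) * w / (1 - w)
    nu1 = n - 1 - 2 / m
    nu2 = (m - 1) * nu1
    return f, nu1, nu2

w = corrected_w(70, m=3, n=5)
f, nu1, nu2 = f_transformation(w, m=3, n=5)
print(w, f, round(nu1, 3), round(nu2, 3))
```

The degrees of freedom come out as 10/3 and 20/3, which is why interpolation in a table of F is needed.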

Since the degrees of freedom, $\nu_1$ and $\nu_2$, are fractional we must use two-way interpolation
in a table of F:

Interpolation for 5% Point of F

                 ν1 = 3     ν1 = 3 1/3   ν1 = 4

ν2 = 6           4.76       4.683        4.53
ν2 = 6 2/3                  4.410
ν2 = 7           4.35       4.273        4.12

Alternative method (for check)

                 ν1 = 3     ν1 = 3 1/3   ν1 = 4

ν2 = 6           4.76                    4.53
ν2 = 6 2/3       4.487      4.410        4.257
ν2 = 7           4.35                    4.12
The 5% point is 4.41. By the same method, the 1% point is determined to be 8.69.
The value $F = 6.00$ (or $W = 0.75$) is therefore significant at the 5% but not at the 1%
level.
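
The two-way interpolation for the 5% point can be reproduced with a short sketch. It assumes the interpolation is linear in each of the two degrees of freedom, which matches the tabulated intermediate values; the four corner entries are the 5% points of F quoted in the text, and the function names are illustrative.

```python
def interpolate(x, x0, x1, y0, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# 5% points of F keyed by (nu1, nu2), as quoted in the text.
five_percent = {(3, 6): 4.76, (4, 6): 4.53, (3, 7): 4.35, (4, 7): 4.12}

def f_point(nu1, nu2, table):
    """Interpolate first across nu1, then across nu2."""
    at_nu2_6 = interpolate(nu1, 3, 4, table[(3, 6)], table[(4, 6)])
    at_nu2_7 = interpolate(nu1, 3, 4, table[(3, 7)], table[(4, 7)])
    return interpolate(nu2, 6, 7, at_nu2_6, at_nu2_7)

def f_point_check(nu1, nu2, table):
    """Alternative order (for check): first across nu2, then across nu1."""
    at_nu1_3 = interpolate(nu2, 6, 7, table[(3, 6)], table[(3, 7)])
    at_nu1_4 = interpolate(nu2, 6, 7, table[(4, 6)], table[(4, 7)])
    return interpolate(nu1, 3, 4, at_nu1_3, at_nu1_4)

nu1, nu2 = 10 / 3, 20 / 3
point = f_point(nu1, nu2, five_percent)
check = f_point_check(nu1, nu2, five_percent)
print(round(point, 3), round(check, 3))
```

Both orders of interpolation give the same result, since linear interpolation in two directions does not depend on the order in which it is carried out.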

If  we  wish  to  interpolate  between  these  levels  we  have  the  following  set  up: 


F        P

4.41     0.05
6
8.69     0.01

For  the  value  of  P  we  find  0.035  as  against  0.026  given  by  the  exact  method. 
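
A minimal sketch of this interpolation, assuming it is linear in F between the 5% and 1% points (which reproduces the value 0.035), is:

```python
def interpolate_probability(f, f_upper_p, p_upper, f_lower_p, p_lower):
    """Linear interpolation of P between two tabulated points of F."""
    fraction = (f - f_upper_p) / (f_lower_p - f_upper_p)
    return p_upper + fraction * (p_lower - p_upper)

# F = 6 lies between the 5% point (4.41) and the 1% point (8.69).
p = interpolate_probability(6.0, 4.41, 0.05, 8.69, 0.01)
print(round(p, 3))
```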

For n > 7 we may use the chi-square distribution as follows. Set

$$\chi^2 = m(n - 1)W, \qquad (17)$$

where $W$ is to be corrected for continuity. The expression (17) has a chi-square
distribution with n - 1 degrees of freedom.

Although in the preceding example n is only 5 and the use of $\chi^2$ is not justified we
shall use it here for purposes of illustration.

We find

$$\chi^2 = 3 \times (5 - 1) \times 0.75 = 9.00.$$

Using a table of $\chi^2$ (for example, that to be found in [3]) we find for 4 degrees of freedom
that this value is not significant at the 5% level, which illustrates the use of the
method and also shows that it is unsafe to use it when n is not greater than 7.
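
The chi-square calculation can be sketched as follows. In place of a table, the sketch uses the closed form of the upper tail probability that holds when the number of degrees of freedom is even, $e^{-x/2} \sum_{k < \nu/2} (x/2)^k / k!$; this substitution and the function names are assumptions of the sketch, not part of the report.

```python
import math

def chi_square_statistic(w, m, n):
    """Equation (17): chi-square = m (n - 1) W, with W corrected for continuity."""
    return m * (n - 1) * w

def upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x); valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

chi2 = chi_square_statistic(0.75, m=3, n=5)
p = upper_tail_even_df(chi2, df=4)
print(chi2, round(p, 3))
```

The probability is a little above 0.06, so the value is not significant at the 5% level, whereas the exact method gave 0.026.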

18. Tied Ranks. When ties occur the F-test and the chi-square test require no
modification unless the number of ties is large. In this situation the test is complicated
and no attempt will be made to discuss it here. The interested reader is referred to [1]
and [2].

19. Relation between Concordance and Correlation. The score $S_\tau$ and Kendall's
coefficient $\tau$ are meaningless when more than two rankings are concerned. However, the score
$S_W$ and the coefficient of concordance $W$ can be calculated for two rankings, that is for
m = 2, just as well as for any other values of m. In fact, if when m = 2 the values of
$S_\tau$ and $S_W$ are calculated for two given rankings, these values will have exactly the same
probability levels. A similar statement may be made for $W$ and $\tau$.